21 research outputs found

    A Form of List Viterbi Algorithm for Decoding Convolutional Codes

    The Viterbi algorithm is a maximum-likelihood decoding algorithm used to decode convolutional codes in several wireless communication systems, including Wi-Fi. The standard Viterbi algorithm produces a single decoded output, which may be correct or incorrect. Incorrect packets are normally discarded, necessitating retransmission and hence causing considerable energy loss and delay. Some real-time applications, such as Voice over Internet Protocol (VoIP) telephony, do not tolerate excessive delay, which makes the conventional Viterbi decoding strategy sub-optimal. In this regard, a modified approach involving a form of List Viterbi decoding of the convolutional code is investigated. The technique combines the bit-error correction capabilities of the Viterbi algorithm with Cyclic Redundancy Check (CRC) procedures: it first uses a form of 'List Viterbi Algorithm' (LVA) to generate a list of candidate decoded outputs after the trellis search, and a CRC check then determines whether a correct outcome is present. Simulation results show considerable improvement in bit-error performance compared to the classical approach.
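    The decode-then-check step can be sketched as follows. This is a minimal illustration, not the paper's implementation: the candidate list is hard-coded rather than produced by an actual trellis search, and CRC-32 stands in for whichever CRC the system uses.

```python
import zlib

def crc_select(candidates):
    """Return the payload of the first candidate whose trailing 4-byte
    CRC-32 matches, mimicking LVA + CRC post-selection: the list Viterbi
    search proposes candidates, and the CRC check picks the survivor."""
    for cand in candidates:
        payload, received = cand[:-4], cand[-4:]
        if zlib.crc32(payload).to_bytes(4, "big") == received:
            return payload
    return None  # every candidate fails: fall back to retransmission

# Toy usage: the first candidate is corrupted, the second is intact.
good = b"hello"
frame = good + zlib.crc32(good).to_bytes(4, "big")
corrupt = b"hellp" + frame[-4:]
print(crc_select([corrupt, frame]))  # -> b'hello'
```

    If no candidate passes the CRC, the decoder can still request a retransmission, so the scheme only reduces, rather than eliminates, retransmissions.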

    HERDPhobia: A Dataset for Hate Speech against Fulani in Nigeria

    Social media platforms allow users to freely share their opinions about issues or anything they feel like. However, they also make it easier to spread hate and abusive content. The Fulani ethnic group has been a victim of this unfortunate phenomenon. This paper introduces HERDPhobia, the first annotated hate speech dataset on Fulani herders in Nigeria, in three languages: English, Nigerian-Pidgin, and Hausa. We present a benchmark experiment using pre-trained language models to classify the tweets as either hateful or non-hateful. Our experiment shows that the XLM-T model provides better performance, with 99.83% weighted F1. We released the dataset at https://github.com/hausanlp/HERDPhobia for further research. Comment: To appear in the Proceedings of the Sixth Workshop on Widening Natural Language Processing at EMNLP 2022.
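    For reference, the weighted F1 reported above averages per-class F1 scores weighted by class frequency. A self-contained toy implementation (not the paper's evaluation code, which presumably uses a standard library routine) might look like:

```python
def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with class-frequency weights."""
    classes = sorted(set(y_true))
    total = len(y_true)
    score_sum = 0.0
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        score_sum += f1 * (y_true.count(c) / total)  # weight by support
    return score_sum

# Toy usage: one of two labels is misclassified.
print(round(weighted_f1([0, 1], [0, 0]), 3))  # -> 0.333
```

    Weighting by support matters for hate-speech data, where the hateful class is typically the minority.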

    PERCEIVED CORRELATION BETWEEN COMMUNICATION STYLES AND INTERPERSONAL CONFLICT RESOLUTION AMONG INTERNATIONAL STUDENTS IN MALAYSIA

    Background and Purpose: A good and fulfilling relationship among individuals from distinct cultural backgrounds depends on effective communication. This research examined the perceived relationship between communication styles and interpersonal conflict resolution among international students in Malaysian universities.

    Methodology: The study employed a cross-sectional survey in which self-developed structured questionnaires were used to gather data from a random sample of 324 international students in 15 higher institutions across Kuala Lumpur, Malaysia. The data were analyzed using multiple regression analysis.

    Findings: The findings revealed a significant positive relationship between communication styles and interpersonal conflict resolution among international students. Specifically, the passive, passive-aggressive, and assertive communication styles have a significant positive relationship with conflict resolution. However, the aggressive communication style exerts an insignificant effect on conflict resolution (t = 0.734, p = 0.463); thus, the students generally believe this style does not help to resolve interpersonal conflict. These outcomes suggest the students' readiness to cultivate a peaceful learning environment.

    Contributions: This study provides relevant information that can help educational decision-makers strengthen cross-cultural collaboration among international students in the Malaysian context. This information can also facilitate successful academic, professional, and social cooperation.

    Keywords: Cross-cultural relationship, interpersonal conflict resolution, communication styles, international students, Malaysia.

    Cite as: Mohammed, S., Nasidi, Q. Y., Muhammed, M. U., Umar, M. M., & Hassan, I. (2023). Perceived correlation between communication styles and interpersonal conflict resolution among international students in Malaysia. Journal of Nusantara Studies, 8(2), 352-372. http://dx.doi.org/10.24200/jonus.vol8iss2pp352-372

    Deep Sequence Models for Text Classification Tasks

    The exponential growth of data generated on the Internet in the current information age is a driving force for the digital economy, and extracting information is the major source of value in accumulated big data. Machine learning algorithms that depend on statistical analysis and hand-engineered rules are overwhelmed by the vast complexities inherent in human languages. Natural Language Processing (NLP) equips machines to understand these diverse and complicated human languages. Text classification is an NLP task that automatically identifies patterns based on predefined or undefined labeled sets. Common text classification applications include information retrieval, news topic modeling, theme extraction, sentiment analysis, and spam detection. In texts, some sequences of words depend on the previous or next word sequences to make full meaning; this is a challenging dependency task that requires the machine to store important earlier information so it can affect later meaning. Sequence models such as RNNs, GRUs, and LSTMs are a breakthrough for tasks with long-range dependencies. As such, we applied these models to binary and multi-class classification. The results were excellent, with most of the models performing in the range of 80% to 94%. However, this result is not exhaustive, as we believe there is room for improvement if machines are to compete with humans.
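    The state-carrying recurrence these models share can be illustrated with a bare vanilla-RNN cell, written in pure Python for readability. Real RNN/GRU/LSTM layers add gating and run on optimized tensor libraries; this sketch only shows how a hidden state propagates information across a sequence.

```python
import math

def rnn_step(x, h, Wx, Wh, b):
    """One recurrence step: h' = tanh(Wx @ x + Wh @ h + b)."""
    return [math.tanh(sum(Wx[i][j] * x[j] for j in range(len(x)))
                      + sum(Wh[i][j] * h[j] for j in range(len(h)))
                      + b[i])
            for i in range(len(h))]

def encode(sequence, h0, Wx, Wh, b):
    """Fold a whole token sequence through the cell. Because h is
    carried forward, an early token can still influence the final
    state -- the long-range dependency discussed above."""
    h = h0
    for x in sequence:
        h = rnn_step(x, h, Wx, Wh, b)
    return h
```

    A downstream classifier would read the final hidden state (or a pooled summary of all states) and map it to class probabilities.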

    Semi-automatic approaches for exploiting shifter patterns in domain-specific sentiment analysis

    This paper describes two different approaches to sentiment analysis. The first is a symbolic approach that exploits a sentiment lexicon together with a set of shifter patterns and rules. The sentiment lexicon includes single words (unigrams) and is developed automatically by exploiting labeled examples. The shifter patterns cover intensification, attenuation/downtoning and inversion/reversal, and are developed manually. The second approach exploits a deep neural network that uses a pre-trained language model. Both approaches were applied to texts in the economics and finance domains from newspapers in European Portuguese. We show that the symbolic approach achieves virtually the same performance as the deep neural network. In addition, the symbolic approach provides understandable explanations, and the acquired knowledge can be communicated to others. We release the shifter patterns to motivate future research in this direction.
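    The mechanics of the three shifter classes can be illustrated with a toy lexicon-plus-shifters scorer. The lexicon entries, shifter words, and weights below are hypothetical stand-ins, not the released resources, and real shifter patterns operate over richer contexts than the single preceding token used here.

```python
# Hypothetical mini-lexicon and shifter sets (illustrative only).
LEXICON = {"growth": 1.0, "profit": 1.0, "loss": -1.0, "risk": -0.5}
INTENSIFIERS = {"very": 1.5, "strongly": 1.5}
ATTENUATORS = {"slightly": 0.5, "somewhat": 0.5}
INVERTERS = {"not", "no", "never"}

def score(tokens):
    """Sum lexicon polarities, letting the preceding token shift each
    one: intensify, attenuate, or invert -- the three pattern classes."""
    total = 0.0
    for i, tok in enumerate(tokens):
        if tok not in LEXICON:
            continue
        polarity = LEXICON[tok]
        prev = tokens[i - 1] if i > 0 else None
        if prev in INTENSIFIERS:
            polarity *= INTENSIFIERS[prev]
        elif prev in ATTENUATORS:
            polarity *= ATTENUATORS[prev]
        elif prev in INVERTERS:
            polarity *= -1.0
        total += polarity
    return total

print(score(["very", "profit"]))  # -> 1.5
```

    Because every score decomposes into lexicon hits and the shifters applied to them, the prediction comes with a built-in explanation, which is the interpretability advantage the abstract notes.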

    HaVQA: A Dataset for Visual Question Answering and Multimodal Research in Hausa Language

    This paper presents HaVQA, the first multimodal dataset for visual question-answering (VQA) tasks in the Hausa language. The dataset was created by manually translating 6,022 English question-answer pairs, which are associated with 1,555 unique images from the Visual Genome dataset. As a result, the dataset provides 12,044 gold-standard English-Hausa parallel sentences that were translated in a fashion that guarantees their semantic match with the corresponding visual information. We conducted several baseline experiments on the dataset, including visual question answering, visual question elicitation, text-only machine translation, and multimodal machine translation. Comment: Accepted at ACL 2023 as a long paper (Findings).

    AfriSenti: A Twitter Sentiment Analysis Benchmark for African Languages

    Africa is home to over 2000 languages from over six language families and has the highest linguistic diversity among all continents. This includes 75 languages with at least one million speakers each. Yet, there is little NLP research conducted on African languages. Crucial in enabling such research is the availability of high-quality annotated datasets. In this paper, we introduce AfriSenti, which consists of 14 sentiment datasets of 110,000+ tweets in 14 African languages (Amharic, Algerian Arabic, Hausa, Igbo, Kinyarwanda, Moroccan Arabic, Mozambican Portuguese, Nigerian Pidgin, Oromo, Swahili, Tigrinya, Twi, Xitsonga, and Yorùbá) from four language families, annotated by native speakers. The data is used in SemEval 2023 Task 12, the first Afro-centric SemEval shared task. We describe the data collection methodology, annotation process, and related challenges when curating each of the datasets. We conduct experiments with different sentiment classification baselines and discuss their usefulness. We hope AfriSenti enables new work on under-represented languages. The dataset is available at https://github.com/afrisenti-semeval/afrisent-semeval-2023 and can also be loaded as a Hugging Face dataset (https://huggingface.co/datasets/shmuhammad/AfriSenti). Comment: 15 pages, 6 figures, 9 tables.

    Humoral immunological kinetics of severe acute respiratory syndrome coronavirus 2 infection and diagnostic performance of serological assays for coronavirus disease 2019: an analysis of global reports

    As the coronavirus disease 2019 (COVID-19) pandemic continues to rise and second waves are reported in some countries, serological test kits and strips are being considered to scale up an adequate laboratory response. This study provides an update on the kinetics of the humoral immune response to severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection and the performance characteristics of serological protocols (lateral flow assay [LFA], chemiluminescence immunoassay [CLIA] and ELISA) used for evaluation of recent and past SARS-CoV-2 infection. A thorough and comprehensive review of suitable and eligible full-text articles was performed on PubMed, Scopus, Web of Science, Worldometer and medRxiv from 10 January to 16 July 2020. These articles were searched using the Medical Subject Headings terms 'COVID-19', 'Serological assay', 'Laboratory Diagnosis', 'Performance characteristics', 'POCT', 'LFA', 'CLIA', 'ELISA' and 'SARS-CoV-2'. Data from original research articles on SARS-CoV-2 antibody detection from the second day post-infection onward were included in this study. In total, there were 7938 published articles on the humoral immune response and laboratory diagnosis of COVID-19. Of these, 74 were included in this study. The detection, peak and decline periods of blood anti-SARS-CoV-2 IgM, IgG and total antibodies for point-of-care testing (POCT), ELISA and CLIA vary widely. The most promising of these assays for POCT detected anti-SARS-CoV-2 at day 3 post-infection and peaked on the 15th day; ELISA products detected anti-SARS-CoV-2 IgM and IgG at days 2 and 6, then peaked on the eighth day; and the most promising CLIA product detected anti-SARS-CoV-2 at day 1 and peaked on the 30th day. The LFA, ELISA and CLIA products with the best performance characteristics were those targeting total SARS-CoV-2 antibodies, followed by those targeting anti-SARS-CoV-2 IgG and then IgM.
Essentially, the CLIA-based SARS-CoV-2 tests had the best performance characteristics, followed by ELISA and then POCT. Given the varied performance characteristics of all the serological assays, there is a need to continuously improve their detection thresholds, as well as to monitor and re-evaluate their performances, to assure their significance and applicability for COVID-19 clinical and epidemiological purposes.

    Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets

    With the success of large-scale pre-training and multilingual modeling in Natural Language Processing (NLP), recent years have seen a proliferation of large, web-mined text datasets covering hundreds of languages. We manually audit the quality of 205 language-specific corpora released with five major public datasets (CCAligned, ParaCrawl, WikiMatrix, OSCAR, mC4). Lower-resource corpora have systematic issues: at least 15 corpora have no usable text, and a significant fraction contains less than 50% sentences of acceptable quality. In addition, many are mislabeled or use nonstandard/ambiguous language codes. We demonstrate that these issues are easy to detect even for non-proficient speakers, and supplement the human audit with automatic analyses. Finally, we recommend techniques to evaluate and improve multilingual corpora and discuss potential risks that come with low-quality data releases. Comment: Accepted at TACL; pre-MIT Press publication version.
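    As a rough illustration of the kind of automatic sentence-level check such an audit can complement human review with, here is a crude quality filter. The heuristics and thresholds are made up for the sketch, not the paper's actual criteria.

```python
def acceptable(sentence: str, min_words: int = 3, min_alpha: float = 0.5) -> bool:
    """Toy quality heuristics: the sentence must contain at least
    min_words whitespace-separated tokens, and at least min_alpha of
    its characters must be letters or spaces (rejecting boilerplate
    made of digits, markup debris, or symbol runs)."""
    words = sentence.split()
    if len(words) < min_words:
        return False
    alpha = sum(c.isalpha() or c.isspace() for c in sentence)
    return alpha / max(len(sentence), 1) >= min_alpha

print(acceptable("This is a normal sentence."))  # -> True
```

    Checks like this catch obviously unusable text, but the mislabeled-language and ambiguous-language-code problems the audit reports still require language identification or human judgment.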

    MasakhaNEWS: News Topic Classification for African languages

    African languages are severely under-represented in NLP research due to a lack of datasets covering several NLP tasks. While there are individual language-specific datasets that are being expanded to different tasks, only a handful of NLP tasks (e.g. named entity recognition and machine translation) have standardized benchmark datasets covering several geographically and typologically diverse African languages. In this paper, we develop MasakhaNEWS -- a new benchmark dataset for news topic classification covering 16 languages widely spoken in Africa. We provide an evaluation of baseline models by training classical machine learning models and fine-tuning several language models. Furthermore, we explore several alternatives to full fine-tuning of language models that are better suited for zero-shot and few-shot learning, such as cross-lingual parameter-efficient fine-tuning (like MAD-X), pattern exploiting training (PET), prompting language models (like ChatGPT), and prompt-free sentence transformer fine-tuning (SetFit and the Cohere Embedding API). Our evaluation in the zero-shot setting shows the potential of prompting ChatGPT for news topic classification in low-resource African languages, achieving an average performance of 70 F1 points without leveraging additional supervision like MAD-X. In the few-shot setting, we show that with as little as 10 examples per label, we achieve more than 90% (i.e. 86.0 F1 points) of the performance of full supervised training (92.6 F1 points) using the PET approach. Comment: Accepted to IJCNLP-AACL 2023 (main conference).